
POC: Allow hierarchical filtering of logger namespaces #22859

Merged: 7 commits into Koenkk:dev on Jun 4, 2024

Conversation

@ghoz (Contributor) commented May 31, 2024

When playing with the new log_namespaced_levels (#22619), I expected that 'zh:zstack:unpi': info would also match all namespaces below it, such as zh:zstack:unpi:parser, zh:zstack:unpi:writer, etc.

This PR implements a quick POC to allow that.
To limit the runtime penalty, it uses the same namespacedLevels as a lazy cache: the first time a new namespace is seen, it checks whether any namespace above it is defined in the config and stores the resolved level. So, in effect:

advanced:
  log_namespaced_levels:
    'zh:zstack:unpi': info

would also apply to zh:zstack:unpi:writer, zh:zstack:unpi:parser, and anything else below zh:zstack:unpi that I might not have seen yet. Of course, if a more specific config entry is defined, it is picked up instead.

This would allow adding new namespace sublevels without breaking existing configurations, for example z2m:mqtt:connection and z2m:mqtt:publish.
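
For illustration, here is a minimal sketch of the lookup this describes (names are mine, not the PR's actual code): walk up the ':'-separated hierarchy until a configured level is found, then cache the answer so later calls pay only a single key lookup.

const configured: Record<string, string> = {'zh:zstack:unpi': 'info'};
const cache: Record<string, string> = {};
const defaultLevel = 'warning';

function resolveLevel(namespace: string): string {
    if (cache[namespace]) return cache[namespace];
    for (let ns = namespace; ; ) {
        // A more specific entry (the namespace itself) wins over its parents.
        if (configured[ns]) return (cache[namespace] = configured[ns]);
        const sep = ns.lastIndexOf(':');
        if (sep === -1) return (cache[namespace] = defaultLevel);
        ns = ns.slice(0, sep);
    }
}

// resolveLevel('zh:zstack:unpi:writer') === 'info', inherited from 'zh:zstack:unpi'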

BUGS / TODOs:

  • The default log level cannot be dynamically changed; the cache would need to be reset when the level is changed at runtime
  • Fix the logger tests, as the namespacedLevels are now filled (no longer necessary)
  • Add new tests for the feature
  • There is most likely a nicer / more TypeScript-y way of doing this... expecting @Nerivec to replace it with a one-liner (close enough)
  • Add a note in the documentation that the config applies to all namespaces below (or maybe that isn't even needed, since that was my initial expectation): mention the hierarchical namespace for logging zigbee2mqtt.io#2786

@Nerivec (Collaborator) commented May 31, 2024

Doing any kind of looping in log, which can be called hundreds of times per second, will grow costly really fast (and block), at least until the cache is built (or rebuilt after any level change).
The simple 1:1 match on configured object keys limits that problem and still allows customizing as needed, albeit manually for "below" namespaces...
debug also has namespace regex filtering (that's the level most affected by deeper namespaces).

If this is really needed, it will require a well-optimized algorithm and performance-impact testing in various "spammy" scenarios (startup, OTA, MQTT debug, etc.).
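
For context, the manual alternative being described looks roughly like this; the debug regex option is shown under the name log_debug_namespace_ignore, which is an assumption about the exact settings key:

advanced:
  log_namespaced_levels:
    # 1:1 matches: each sub-namespace has to be listed explicitly today
    'zh:zstack:unpi': info
    'zh:zstack:unpi:parser': info
    'zh:zstack:unpi:writer': info
  # debug-only regex filter (assumed key name)
  log_debug_namespace_ignore: '^zh:zstack:unpi'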

@ghoz (Contributor, Author) commented Jun 1, 2024

looping in log, which can be called hundreds of times per second, will grow costly really fast

Most of those calls use the same few namespaces, so after the first hits the cost is the same as it is currently: one key lookup per call. I don't believe the number of entries in the cache would add any measurable cost to the lookup.

I don't have the means to do perf testing, but even with that crude fallback when a new namespace is encountered, I did not notice any slowdowns on my lowly setup.
After all, we are not talking about thousands of very deep namespaces, and z2m is most likely still starting up, so I suspect the potential perf impact would be hidden in the noise...

Right now, the debug regex approach would need to be reimplemented at the info level to allow splitting mqtt connection/login info messages from the flood of mqtt updates... It would be interesting to compare the regex match cost, which this time is incurred on every log call.

We have a really nice new hierarchical namespace; we might as well use it 😋

@Nerivec (Collaborator) commented Jun 1, 2024

The cost once the cache is built would not be the problem. Actually, once it's built, it's probably a tiny bit faster.
But I suspect issues might arise when a cache rebuild is triggered. Performance is likely to drop 5-10x (even if just for a short time). If this happens during a heavy operation, I wouldn't be surprised if it crashed z2m, especially on low-resource machines.
Needs to be investigated further.
It also might be counterproductive in some scenarios for devs wanting to suppress higher-level namespaces while keeping lower ones (granted, debug regex is likely a better fit there).

@Nerivec (Collaborator) commented Jun 1, 2024

Ran some quick & dirty benchmarks for reference: simple ones covering just the differences in the logic involved (I also threw in a regex variant for good measure, since it's adjacent to this).
The benchmarks run only the base logic (filtering): no transport, no logging at all, just pushing non-filtered messages into an array local to each test so every variant pays a minimal, equal cost.

Current:

  • for loop x1M: 189.161ms
  • Benchmark.js: 6,220,255 ops/sec ±0.94% (89 runs sampled)

Looping (cache built once):

  • for loop x1M: 162.943ms
  • Benchmark.js: 6,868,936 ops/sec ±0.39% (90 runs sampled)

Current / Regex:

  • for loop x1M: 216.387ms
  • Benchmark.js: 4,832,131 ops/sec ±2.18% (87 runs sampled)

Current with object assign cost (just to compare below rebuild without this cost / not significant):

  • for loop x1M: 196.371ms
  • Benchmark.js: 5,528,133 ops/sec ±0.56% (90 runs sampled)

Looping with rebuild:

  • for loop x1M: 652.21ms
  • Benchmark.js: 1,625,674 ops/sec ±0.61% (90 runs sampled)
Code
// Note: LogLevel is written out as an explicit union; the readonly string[]
// annotation would otherwise widen `typeof LOG_LEVELS[number]` to plain string.
const LOG_LEVELS: readonly string[] = ['error', 'warning', 'info', 'debug'];
type LogLevel = 'error' | 'warning' | 'info' | 'debug';
const LEVEL = 'info';
const namespaces: string[] = [
    'zh:zstack:unpi:parser',
    'zh:zstack:unpi',
    'zh:zspec',
    'z2m',
];
let logger1NamespacedLevels: {[ns: string]: string} = {
    'zh:zstack:unpi:parser': 'warning',
    'zh:zstack:unpi': 'warning',
};
let logger2NamespacedLevels: {[ns: string]: string} = {
    'zh:zstack:unpi': 'warning',
};

const log1 = (level: LogLevel, message: string, namespace: string, picked: string[]): void => {
    const nsLevel = logger1NamespacedLevels[namespace] ?? LEVEL;

    if (LOG_LEVELS.indexOf(level) <= LOG_LEVELS.indexOf(nsLevel)) {
        picked.push(message);
    }
};

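// Resolve the level for a not-yet-seen namespace by walking up the ':'
// hierarchy; the result is memoized into logger2NamespacedLevels so later
// calls take the single-key-lookup path in log2 below.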
const assignLevel = (namespace: string): string => {
    let ns = namespace;

    while (true) {
        const sep = ns.lastIndexOf(":");

        if (sep === -1) {
            return logger2NamespacedLevels[namespace] = LEVEL;
        }

        ns = ns.slice(0, sep);

        if (logger2NamespacedLevels[ns]) {
            return logger2NamespacedLevels[namespace] = logger2NamespacedLevels[ns];
        }
    }
};
const log2 = (level: LogLevel, message: string, namespace: string, picked: string[]): void => {
    const nsLevel = logger2NamespacedLevels[namespace] ?? assignLevel(namespace);

    if (LOG_LEVELS.indexOf(level) <= LOG_LEVELS.indexOf(nsLevel)) {
        picked.push(message);
    }
};

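// log3 models the existing debug-namespace regex filter: namespaces matching
// the ignore regex are dropped before any level comparison happens.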
const log3DebugNamespaceIgnoreRegex = new RegExp(/^zh:zstack:unpi.*/);
const log3 = (level: LogLevel, message: string, namespace: string, picked: string[]): void => {
    if (log3DebugNamespaceIgnoreRegex?.test(namespace)) {
        return;
    }

    const nsLevel = logger2NamespacedLevels[namespace] ?? LEVEL;

    if (LOG_LEVELS.indexOf(level) <= LOG_LEVELS.indexOf(nsLevel)) {
        picked.push(message);
    }
};

console.log(namespaces);
console.log(JSON.stringify(logger1NamespacedLevels));
console.log(JSON.stringify(logger2NamespacedLevels));
console.log(log3DebugNamespaceIgnoreRegex);

let current = 0;
let looping = 0;
let regex = 0;
let currentRebuild = 0;
let loopingRebuild = 0;

const benchCurrent = (): void => {
    const picked: string[] = [];
    log1('info', 'Test1-0', namespaces[0], picked);
    log1('info', 'Test1-1', namespaces[1], picked);
    log1('info', 'Test1-2', namespaces[2], picked);
    log1('info', 'Test1-3', namespaces[3], picked);
    current += picked.length;
};
const benchLooping = (): void => {
    const picked: string[] = [];
    log2('info', 'Test2-0', namespaces[0], picked);
    log2('info', 'Test2-1', namespaces[1], picked);
    log2('info', 'Test2-2', namespaces[2], picked);
    log2('info', 'Test2-3', namespaces[3], picked);
    looping += picked.length;
};
const benchRegex = (): void => {
    const picked: string[] = [];
    log3('info', 'Test3-0', namespaces[0], picked);
    log3('info', 'Test3-1', namespaces[1], picked);
    log3('info', 'Test3-2', namespaces[2], picked);
    log3('info', 'Test3-3', namespaces[3], picked);
    regex += picked.length;
};
const benchCurrentRebuild = (): void => {
    logger1NamespacedLevels = {
        'zh:zstack:unpi:parser': 'warning',
        'zh:zstack:unpi': 'warning',
    };
    const picked: string[] = [];
    log1('info', 'Test10-0', namespaces[0], picked);
    log1('info', 'Test10-1', namespaces[1], picked);
    log1('info', 'Test10-2', namespaces[2], picked);
    log1('info', 'Test10-3', namespaces[3], picked);
    currentRebuild += picked.length;
};
const benchLoopingRebuild = (): void => {
    logger2NamespacedLevels = {
        'zh:zstack:unpi': 'warning',
    };
    const picked: string[] = [];
    log2('info', 'Test20-0', namespaces[0], picked);
    log2('info', 'Test20-1', namespaces[1], picked);
    log2('info', 'Test20-2', namespaces[2], picked);
    log2('info', 'Test20-3', namespaces[3], picked);
    loopingRebuild += picked.length;
};

// for (const bench of [benchCurrent, benchLooping, benchRegex, benchCurrentRebuild, benchLoopingRebuild]) {
//     console.time(bench.name);
//     for (let i = 0; i < 1000000; i++) {
//         bench();
//     }
//     console.timeEnd(bench.name);
// }
// console.log(current, looping, regex, currentRebuild, loopingRebuild);

import Benchmark from 'benchmark';
const suite = new Benchmark.Suite();

suite.add('Current', benchCurrent);
suite.add('Looping', benchLooping);
suite.add('Regex', benchRegex);
suite.add('Current with object assign cost', benchCurrentRebuild);
suite.add('Looping with rebuild', benchLoopingRebuild);

// eslint-disable-next-line @typescript-eslint/no-explicit-any
suite.on('cycle', function(event: any) {
    console.log(String(event.target));
});

suite.on('complete', function() {
    console.log(JSON.stringify(logger1NamespacedLevels));
    console.log(JSON.stringify(logger2NamespacedLevels));
    console.log(current, looping, regex, currentRebuild, loopingRebuild);
    // @ts-expect-error bench
    console.log(this.filter('successful').map((b) => b.count));
});

suite.run({'async': true});

@Nerivec (Collaborator) commented Jun 1, 2024

That last commit forces the use of the loop, definitely worse for performance.

@ghoz (Contributor, Author) commented Jun 1, 2024

Thank you for taking the time to benchmark this... we really don't have the same definition of "quick and dirty" 😅
To be honest, looking at your results, what surprises me most is that the cached version is ~10% faster than the current setup of falling back to the default level when not found...

The cache build/rebuild would happen only at startup and on level config changes, and only on the first occurrence of each namespace...
Given those numbers, do you still think the 3x hit taken by the first log call of a namespace could be slow enough to hang/crash a loaded low-end machine?

My sample of one is my current setup: a small/medium network of ~60 devices on a Pi 3 that also hosts a small HA instance and a couple of other services... not the beefiest setup, but so far so good.

@ghoz (Contributor, Author) commented Jun 1, 2024

That last commit forces the use of the loop, definitely worse for performance.

It might be better to use the cache indeed 😄

@Nerivec (Collaborator) commented Jun 1, 2024

It might be better to use the cache indeed

The spread operator is a no-go here too, I think. It always has terrible performance for some reason. Object.assign(target, source) is likely tons better.
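
A minimal illustration of the two rebuild styles being compared (variable names are mine, not the PR's):

const configuredLevels: {[ns: string]: string} = {'zh:zstack:unpi': 'warning'};
let cache: {[ns: string]: string} = {};

// Spread: allocates a brand-new object on every rebuild.
cache = {...configuredLevels};

// Object.assign: can reuse the existing target object once it is cleared.
for (const key of Object.keys(cache)) delete cache[key];
Object.assign(cache, configuredLevels);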

what surprises me most is that the cached version is ~10% faster than the current setup of falling back to the default level when not found...

It doesn't hit the fallback (the ?? operator) once the cache is filled. That seems to be exacerbated in this scenario...

This is likely to be a problem if triggered (a level is changed, the cache needs rebuilding) while Z2M is already having a hard time keeping up with incoming/outgoing traffic (during an OTA, or on a network with a bunch of terribly spammy Tuya devices... the usual troublemakers), since this will affect the lower levels the most, and serial comm is the most chatty.
Only the filtering is being benchmarked here, so ops/sec are pretty high, but with actual logging/transports it's going to drop/block very suddenly and very hard (that's nearly a 4x drop without anything else running beside it), in scenarios that are already dicey on their own...
I'd say, at best, it increases the chance of things going wrong too much... That, versus adding a couple of lines to manually cover the below namespaces (since this is likely to be used 98% of the time just to silence serial comm while in debug)... I'd pick the manual labor 😁

@ghoz (Contributor, Author) commented Jun 2, 2024

I'm probably missing some context...
I have a zstack-based adapter and it uses very few low-level namespaces... Does ember use a lot?

Even then, from my understanding, during a heavy OTA the namespaces used will just repeat and hit the cache; only the first handful of serial/OTA/whatever-Tuya messages will pay for the namespace search... and the rest will be very slightly sped up...

And as you said, the benchmark covers only the filtering; when adding the full transport overhead, I assume the difference would be even smaller...

I do have a bunch of the chatty Tuya power-consumption stuff, but no OTA ready to test this...

On my side, I want to silence namespace branches without having to go hunting for each new sub-namespace 😬
And ideally be able to see the mqtt connection info but not the flood of updates (I was one of the few who liked the debug level for those 😁)

Is anyone with a big network and some devices to update on a very resource-constrained machine willing to do some tests 😅?

@Nerivec (Collaborator) commented Jun 2, 2024

Have you tried changing a level during a period of chatty behavior (or maybe try changing the log_level on the Logs page at a fast pace, to stress-test a bit)? What I'm more worried about is the instant where the cache starts to rebuild: it will block each time a new namespace comes up, until all of them have been attributed a level (and the first few are likely to block the most). But it could just be me being paranoid 😬
Between zh and zhc, there are probably around 200 namespaces (although several are device-specific, so most people will only ever see a portion of that). You have to consider scenarios like this one:

advanced:
  log_level: info
  log_namespaced_levels:
    zh: debug
    zhc: debug

You're right though: the cost of the extra logic is probably drowned in the relative expensiveness of the logging itself; the difference won't be smaller, just hidden by the drastically lower ops/sec.
I'll try to run some tests with real logging to confirm.

@Nerivec (Collaborator) commented Jun 2, 2024

With the commit I just added, it looks like this.

Since I changed benchmark tools, I figured I should post the previous results back for reference.

| Task Name | ops/sec | Average Time (ns) | Margin | Samples |
| --- | --- | --- | --- | --- |
| Current | 2,811,858 | 355.6367133925188 | ±0.16% | 14059291 |
| Looping | 1,728,545 | 578.5211784473531 | ±0.22% | 8642726 |
| Looping with separate cache | 1,719,766 | 581.4743780591276 | ±0.28% | 8598832 |
| Regex | 2,097,416 | 476.7770937359666 | ±2.70% | 10487081 |
| Current with object assign cost | 2,674,783 | 373.8620109610369 | ±0.24% | 13373919 |
| Looping with rebuild | 928,875 | 1076.5706056596546 | ±0.12% | 4644378 |
| Looping with rebuild with separate cache - Object.assign | 877,969 | 1138.9922066567713 | ±0.15% | 4389846 |
| Looping with rebuild with separate cache - spread operator | 161,892 | 6176.9242479398645 | ±1.16% | 809465 |

(That darn spread operator 😓)

Now, the actual logging (the exact same logger class as Z2M, just made to stand alone), with 'console' as the only output ('file' introduces too much randomness). Bench with a 5s runtime and 1s warmup. Each run calls logger.info 8 times, with 6 of the calls filtered out.

| Task Name | ops/sec | Average Time (ns) | Margin | Samples |
| --- | --- | --- | --- | --- |
| Current namespaced | 1,847 | 541393.4846830966 | ±2.53% | 9236 |
| Looping namespaced | 1,876 | 532841.1613888753 | ±2.87% | 9384 |
| Looping namespaced with cache reset in middle | 1,846 | 541639.9772493636 | ±2.68% | 9232 |
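
The column layout above matches tinybench's table output, so the setup was presumably along these lines (a hedged sketch, not the exact harness):

import {Bench} from 'tinybench';

const bench = new Bench({time: 5000, warmupTime: 1000});
bench.add('Current namespaced', () => {
    // call logger.info 8 times here, with 6 of the calls filtered out
});
// (older tinybench versions need an explicit `await bench.warmup()` first)
await bench.run();
console.table(bench.table());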

@ghoz (Contributor, Author) commented Jun 3, 2024

Have you tried changing a level during a period of chatty behavior (or maybe try changing the log_level on the Logs page at a fast pace, to stress-test a bit)? What I'm more worried about is the instant where the cache starts to rebuild: it will block each time a new namespace comes up, until all of them have been attributed a level (and the first few are likely to block the most).

I didn't go as far as scripting log-level change requests at a fast pace, but with "quick" manual changes I didn't see any adverse effects...

@Nerivec (Collaborator) commented Jun 3, 2024

@Koenkk Can you see any usage issue with this automatic hierarchical filtering? It would prevent setting only an upper namespace to a specific level without affecting the namespaces below it (that would require manually setting the below namespaces; it basically reverses the current behavior).

@Koenkk (Owner) commented Jun 3, 2024

@Nerivec I don't see any issues with that; the PR also looks OK, so let me know if this can be merged.

Edit: the only thing left is a PR for a docs update.

ghoz added a commit to ghoz/zigbee2mqtt.io that referenced this pull request Jun 3, 2024
ghoz added a commit to ghoz/zigbee2mqtt.io that referenced this pull request Jun 3, 2024
@ghoz (Contributor, Author) commented Jun 3, 2024

only thing left is a PR for a docs update.

done: Koenkk/zigbee2mqtt.io#2786

@ghoz force-pushed the feat-log-namespace-hierarchy branch from b6fef7e to 41fb6f3 on June 3, 2024, 22:31
ghoz and others added 5 commits June 4, 2024 01:59
matches the closest defined namespace log level:

advanced:
  log_namespaced_levels:
    'zh:zstack:unpi': info

would also apply to zh:zstack:unpi:write and zh:zstack:unpi:parse

To limit the runtime penalty, uses the namespacedLevels as a lazy cache
Avoid polluting the namespacedLevels in the config / UI
Add cache reset on config-time log level change
@Nerivec force-pushed the feat-log-namespace-hierarchy branch from 41fb6f3 to 803c76b on June 3, 2024, 23:59
@Nerivec (Collaborator) commented Jun 4, 2024

The fix for #22935 will use the winston hotpath and avoid the splat logic entirely (/%[scdjifoO%]/g). Seems to be a bug/weirdness in winston (it doesn't pass meta), but for z2m's case, we might as well use the hotpath.
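
For reference, a hedged reading of "use the hotpath": winston's splat format scans messages with /%[scdjifoO%]/g for printf-style tokens, and calls that pass just a level plus a fully built message string sidestep that machinery, e.g.:

import winston from 'winston';

const logger = winston.createLogger({transports: [new winston.transports.Console()]});
const topic = 'zigbee2mqtt/bridge/state';

// Pre-built string: stays on winston's optimized (level, message) path.
logger.info(`Received message on '${topic}'`);
// printf-style tokens plus extra arguments route through splat processing.
logger.info("Received message on '%s'", topic);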

@Koenkk merged commit 881037c into Koenkk:dev on Jun 4, 2024
11 checks passed
@Koenkk (Owner) commented Jun 4, 2024

Thanks!

Koenkk pushed a commit to Koenkk/zigbee2mqtt.io that referenced this pull request Jun 4, 2024
* mention the hierarchical namespace for logging

Documents Koenkk/zigbee2mqtt#22859

* fix wording according to review
ghoz added a commit to ghoz/zigbee2mqtt that referenced this pull request Jun 14, 2024
* POC: Allow hierarchical filtering of logger namespaces

matches the closest defined namespace log level:

advanced:
  log_namespaced_levels:
    'zh:zstack:unpi': info

would also apply to zh:zstack:unpi:write and zh:zstack:unpi:parse

To limit the runtime penalty, uses the namespacedLevels as a lazy cache

* Move namespace levels to separate cache

Avoid polluting the namespacedLevels in the config / UI
Add cache reset on config-time log level change

* Actually use the cache

* Optimize/Cleanup

* Add some namespace hierarchy filter tests

* More coverage.

* Fix Koenkk#22935

---------

Co-authored-by: Nerivec <62446222+Nerivec@users.noreply.github.com>
@@ -102,9 +106,6 @@ class Logger {

const options: KeyValue = {
app_name: 'Zigbee2MQTT',
format: winston.format.printf(/* istanbul ignore next */(info) => {
@Koenkk (Owner):

Why was this removed? I think this causes #23241

@Nerivec (Collaborator):

It seems it might not output the message without expressly returning it from printf?

format: winston.format.printf((info) => info.message),

Strange that it's undefined though; I would have thought the default, absent a message, would be the stringified JSON of the info param.

@Koenkk (Owner):

@ghoz could you comment on this?

@ghoz (Contributor, Author):

I could, but I would not have much of interest to say...
I see it was removed when going for the winston hotpath directly:
827e18e#diff-be38649fab0f205ab799e4de3d3b56ad7d54397f525e8f370cda402efcece524L109
but I'm not familiar with this low-level code, to be honest...

@ghoz (Contributor, Author):

I don't have a syslog-enabled system to test it out, but I'd suggest reverting this hunk...
@Nerivec, do you remember why it was removed?

@Nerivec (Collaborator):

Confirmed. It works using:

            const options: KeyValue = {
                app_name: 'Zigbee2MQTT',
                format: winston.format.printf((info) => info.message),
                ...settings.get().advanced.log_syslog,
            };

Not sure why it doesn't output JSON by default (might be the combine during logger init), but since we don't want JSON anyway, that was definitely a bit too much "cleanup" in my last commit here 😅
We might want to add a test to cover this code path (we'd have to hijack well above syslog though, I guess).
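
A hedged sketch of what such a test could assert directly against the formatter (winston's printf stores its output under Symbol.for('message'); treat the details as assumptions):

import winston from 'winston';

const syslogFormat = winston.format.printf((info) => String(info.message));
const info = {level: 'info', message: 'hello'};
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const result = syslogFormat.transform(info) as any;

// With the printf format restored, the bare message should come through.
console.assert(result[Symbol.for('message')] === 'hello');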

@Nerivec mentioned this pull request Jul 5, 2024